Theory¶

Linear Transformations and Basis Vectors¶

Every matrix represents a linear transformation on a vector space. Usually we represent vector as a linear combination of the unit vectors $\hat{i}, \ \hat{j} \ \text{and} \ \hat{k}$. Hence this basis is written as $\begin{bmatrix} 1 \ \ 0 \ \ 0 \\ 0 \ \ 1 \ \ 0 \\ 0 \ \ 0 \ \ 1 \\ \end{bmatrix} $ (the identity matrix). Let's understand this with an example.
Suppose we have a basis $A$ given by the vectors $(2\hat{i}+3\hat{j})$ and $(4\hat{i}-\hat{j})$ and we want to represent the vector $\vec{v} = (3\hat{i}+2\hat{j})$ in basis $A$.
Let $\vec{v}$ be represented as $<X,Y>$ in basis $A$. $$ \therefore X(2\hat{i}+3\hat{j})+Y(4\hat{i}-\hat{j}) = (3\hat{i}+2\hat{j})\\ \implies (2X+4Y)\hat{i}+(3X-Y)\hat{j} = (3\hat{i}+2\hat{j}) $$

This can also be written as

$$ \begin{bmatrix} 2 & 4 \\ 3 & -1 \\ \end{bmatrix} \begin{bmatrix} X \\ Y \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix} \\ \implies \begin{bmatrix} X \\ Y \end{bmatrix} = \begin{bmatrix} 2 & 4 \\ 3 & -1 \\ \end{bmatrix}^{-1} \begin{bmatrix} 3 \\ 2 \end{bmatrix} = \begin{bmatrix} 0.79 \\ 0.36 \end{bmatrix} $$ So the vector $\vec{v}$ represented in basis $A$ as $<0.79, 0.36>$.


Eigen values & vectors¶

A general vector changes both direction and magnitude when transformed by a matrix. But some special vectors only get stretched or compressed, without changing direction. Such vectors are eigen-vectors. And the scalar by which they get compressed or extended is called eien-value. Formally: $$ Av=\lambda v $$

  • $v\neq 0$ is an eigenvector.
  • $\lambda$ is its eigenvalue.

This equation shows:

The transformation acts like scaling along that vector.

Eigenvectors provide the natural coordinate axes in which the transformation becomes easiest to describe.


How to calculate eigen values & vectors?¶

Observe equation 3. $$Av=\lambda v\implies Av-\lambda v=0\\\implies (A-\lambda I)v=0\implies \text{det}(A-\lambda I)=0\\\therefore \begin{bmatrix} (x_{11}-\lambda) & x_{12} & x_{13} & \dots & x_{1n} \\ x_{21} & (x_{22}-\lambda) & x_{23} & \dots & x_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & x_{m3} & \dots & (x_{mm}-\lambda) \end{bmatrix}=0$$

On expanding this, we get the characteristic polynomial of the matrix. On solving this characteristic polynomial, we get the eigen-values of the matrix. Each eigen-value corresponds to an unique eigen vector.

How to find the characteristic polynomial?¶

Let $A$ be an $n \times n$ matrix. The characteristic polynomial is: $$ p(\lambda) = \det(\lambda I - A) = \lambda^{n} + c_{1}\lambda^{n-1} + c_{2}\lambda^{n-2} + \cdots + c_{n} $$ We compute the coefficients $c_k$ using trace and principal minors.

Step-by-Step Algorithm¶

Step-1 — Initialize the polynomial¶

Start with: $c_0 = 1$ because the leading term of the characteristic polynomial is always $\lambda^n$.

Step-2 — Compute the trace for the first coefficient¶

The coefficient of $\lambda^{n-1}$ is: $c_1 = -\mathrm{trace}(A)$

This is the negative sum of the diagonal elements of $A$, which is equal to the sum of all of its eigen-vectors.

Step-3 — Compute principal minors of higher orders¶

For every $k = 2, 3, \ldots, n-1$:

  1. Take all combinations of $k$ distinct row/column indices $(i_1, i_2, \ldots, i_k)$
  2. For each such combination, form the principal minor.
  3. Compute the determinant of each such $k\times k$ minor.
  4. Add all of them together. This is coefficient of $\lambda^k$.

Step-4 — Compute the determinant of the entire matrix¶

The last coefficient (constant term) is the product of all the eigen-vectors $=$ determinant of the entire matrix.


Function to compute determinant¶

In [1]:
from typing import List

def determinant(matrix: List[List[float]]) -> float:
    if len(matrix) != len(matrix[0]):
        raise ArithmeticError("Parameter must be a square matrix")
    if len(matrix) == 1:
        return matrix[0]
    if len(matrix) == 2:
        return matrix[0][0]*matrix[1][1]-matrix[0][1]*matrix[1][0]
    
    factor = 1.0
    
    # copying the contents of matrix in a seperate array
    x = [row[:] for row in matrix]
    
    while len(x)>2:
        # in case the first element of the matrix is 0
        # we swap the row with another row whose first element is not 0
        if x[0][0] == 0.0:
            for i in range(len(x)):

                # if we find such a row, we swap it with the first row and the factor becomes -1
                if x[i][0] != 0:
                    for j in range(len(x)):
                        temp = x[i][j]
                        x[i][j] = x[0][j]
                        x[0][j] = temp
                    factor *= -1.0
                    break

                # if we reach the end of the matrix and still don't find a suitable row, we return 0
                if x[i][0] == 0 and i==len(x)-1:
                    return 0
        
        factor *= x[0][0]
        arr = []
        for i in range(1,len(x)):
            tempArr = []
            for j in range(1,len(x)):
                tempArr.append(x[i][j]-x[0][j]*x[i][0]/x[0][0])
            arr.append(tempArr[:])
        x = [row[:] for row in arr]
        
    # at the end of the loop the size of x is always 2, so we can directly compute the result
    return factor*(x[0][0]*x[1][1]-x[0][1]*x[1][0])

Function to calculate the characteristic polynomial¶

In [2]:
def minor(matrix, i, j):
    """Return the minor of the matrix after removing row i and column j."""
    return [row[:j] + row[j+1:] for idx, row in enumerate(matrix) if idx != i]

def characteristic_polynomial(matrix):
    """
    Generates the characteristic polynomial coefficients of a matrix.
    Returns coefficients for:
        det(λI - A) = λ^n + c1 λ^(n-1) + ... + cn
    Where:
        c1 = -trace(A)
        cn = (-1)^n det(A)
    """
    n = len(matrix)

    # λ^n coefficient is always 1
    coeffs = [1.0]

    # c1 = -trace
    trace = float(sum(matrix[i][i] for i in range(n)))
    coeffs.append(-trace)

    # Remaining coefficients require computing principal minors of each order
    # a_k = (–1)^k * (sum of all principal minors of order k)
    for k in range(2, n+1):
        from itertools import combinations

        total = 0.0
        for rows in combinations(range(n), k):
            sub = []
            for r in rows:
                sub.append([matrix[r][c] for c in rows])
            total += determinant(sub)

        coeffs.append(((-1)**k) * total)

    return coeffs

Let's take a simple example. Find the characteristic polynomial of the matrix $A=\begin{bmatrix} 5 & 4 & 2\\ 1 & 3 & 1\\ 2 & 1 & 5\\ \end{bmatrix}$.

In [3]:
A = [
    [5.0, 4.0, 2.0],
    [1.0, 3.0, 1.0],
    [2.0, 1.0, 5.0]
]

characteristic_polynomial(A)
Out[3]:
[1.0, -13.0, 46.0, -48.00000000000001]

$\therefore$ The characteristic polynomial is $\lambda^3-13\lambda^2+46\lambda-48=0$.
On solving this equation, we get the eigen values as $\lambda_1=2, \lambda_2=3, \lambda_3=8$.

How to find eigen-vectors from eigen-values?¶

  • For any given eigen-value $\lambda$, compute $A-\lambda I$.
  • Multiply the matrices $A-\lambda I$ and $[x, y, z]^T$.

Example¶

Let's consider the eigen value $\lambda_2=3$. $ \\\therefore \begin{bmatrix} 5-3 & 4 & 2\\ 1 & 3-3 & 1\\ 2 & 1 & 5-3\\ \end{bmatrix} = \begin{bmatrix} 2 & 4 & 2\\ 1 & 0 & 1\\ 2 & 1 & 2\\ \end{bmatrix} \\\therefore \begin{bmatrix} 2 & 4 & 2\\ 1 & 0 & 1\\ 2 & 1 & 2\\ \end{bmatrix}\begin{bmatrix} x\\ y\\ z\\ \end{bmatrix} = \begin{bmatrix} 0\\ 0\\ 0\\ \end{bmatrix} $
From here we get 2 equations.

  • $x+2y+z=0$
  • $x+z=0$

Any vector satisfying any of these equations is the eigen-vector that corresponds to the eigen-value 3 for the basis $A$.